Background and Problem Statement

PASSNYC, a not-for-profit organization that facilitates collective impact, is dedicated to broadening educational opportunities for New York City’s talented and underserved students. New York City is home to some of the most impressive educational institutions in the world, yet in recent years the City’s specialized high schools, institutions with a historically transformative impact on student outcomes, have seen a shift toward more homogeneous student body demographics.

PASSNYC uses public data to identify students within New York City’s under-performing school districts and, through consulting and collaboration with partners, aims to increase the diversity of students taking the Specialized High School Admissions Test (SHSAT). By focusing efforts on under-performing areas that are historically underrepresented in SHSAT registration, PASSNYC will help pave the path to specialized high schools for a more diverse group of students.

With limited time and resources, PASSNYC must be strategic in systemically improving the diversity pipeline and the social mobility of disadvantaged groups into the specialized schools. The main question is: where will PASSNYC’s investment produce the greatest ROI for the services it offers (after-school programs, test preparation, mentoring, and resources for parents)?

Data Dictionary

Data Cleaning

Loading Libraries and Data

Let’s load some libraries we will use.

library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------------------ tidyverse 1.2.1 --
## v ggplot2 3.0.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.6
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts --------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggplot2)
library(digest)
library(corrplot)
## corrplot 0.84 loaded
library(ggmap)
library(GGally)
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
## 
##     nasa
source("http://www.sthda.com/upload/rquery_cormat.r")

Let’s load our data from the two CSV files:

reg <- read_csv("D5 SHSAT Registrations and Testers.csv")
school <- read_csv("2016 School Explorer.csv")

Before we proceed, let’s take a look at the initial data.

head(reg)
## # A tibble: 6 x 7
##   DBN   `School name` `Year of SHST` `Grade level` `Enrollment on ~
##   <chr> <chr>                  <int>         <int>            <int>
## 1 05M0~ P.S. 046 Art~           2013             8               91
## 2 05M0~ P.S. 046 Art~           2014             8               95
## 3 05M0~ P.S. 046 Art~           2015             8               73
## 4 05M0~ P.S. 046 Art~           2016             8               56
## 5 05M1~ P.S. 123 Mah~           2013             8               62
## 6 05M1~ P.S. 123 Mah~           2014             8               62
## # ... with 2 more variables: `Number of students who registered for the
## #   SHSAT` <int>, `Number of students who took the SHSAT` <int>
head(school)
## # A tibble: 6 x 161
##   `Adjusted Grade` `New?` `Other Location~ `School Name` `SED Code`
##   <chr>            <chr>  <chr>            <chr>              <dbl>
## 1 <NA>             <NA>   <NA>             P.S. 015 ROB~    3.10e11
## 2 <NA>             <NA>   <NA>             P.S. 019 ASH~    3.10e11
## 3 <NA>             <NA>   <NA>             P.S. 020 ANN~    3.10e11
## 4 <NA>             <NA>   <NA>             P.S. 034 FRA~    3.10e11
## 5 <NA>             <NA>   <NA>             THE STAR ACA~    3.10e11
## 6 <NA>             <NA>   <NA>             P.S. 064 ROB~    3.10e11
## # ... with 156 more variables: `Location Code` <chr>, District <int>,
## #   Latitude <dbl>, Longitude <dbl>, `Address (Full)` <chr>, City <chr>,
## #   Zip <int>, Grades <chr>, `Grade Low` <chr>, `Grade High` <chr>,
## #   `Community School?` <chr>, `Economic Need Index` <chr>, `School Income
## #   Estimate` <chr>, `Percent ELL` <chr>, `Percent Asian` <chr>, `Percent
## #   Black` <chr>, `Percent Hispanic` <chr>, `Percent Black /
## #   Hispanic` <chr>, `Percent White` <chr>, `Student Attendance
## #   Rate` <chr>, `Percent of Students Chronically Absent` <chr>, `Rigorous
## #   Instruction %` <chr>, `Rigorous Instruction Rating` <chr>,
## #   `Collaborative Teachers %` <chr>, `Collaborative Teachers
## #   Rating` <chr>, `Supportive Environment %` <chr>, `Supportive
## #   Environment Rating` <chr>, `Effective School Leadership %` <chr>,
## #   `Effective School Leadership Rating` <chr>, `Strong Family-Community
## #   Ties %` <chr>, `Strong Family-Community Ties Rating` <chr>, `Trust
## #   %` <chr>, `Trust Rating` <chr>, `Student Achievement Rating` <chr>,
## #   `Average ELA Proficiency` <chr>, `Average Math Proficiency` <chr>,
## #   `Grade 3 ELA - All Students Tested` <int>, `Grade 3 ELA 4s - All
## #   Students` <int>, `Grade 3 ELA 4s - American Indian or Alaska
## #   Native` <int>, `Grade 3 ELA 4s - Black or African American` <int>,
## #   `Grade 3 ELA 4s - Hispanic or Latino` <int>, `Grade 3 ELA 4s - Asian
## #   or Pacific Islander` <int>, `Grade 3 ELA 4s - White` <int>, `Grade 3
## #   ELA 4s - Multiracial` <int>, `Grade 3 ELA 4s - Limited English
## #   Proficient` <int>, `Grade 3 ELA 4s - Economically
## #   Disadvantaged` <int>, `Grade 3 Math - All Students tested` <int>,
## #   `Grade 3 Math 4s - All Students` <int>, `Grade 3 Math 4s - American
## #   Indian or Alaska Native` <int>, `Grade 3 Math 4s - Black or African
## #   American` <int>, `Grade 3 Math 4s - Hispanic or Latino` <int>, `Grade
## #   3 Math 4s - Asian or Pacific Islander` <int>, `Grade 3 Math 4s -
## #   White` <int>, `Grade 3 Math 4s - Multiracial` <int>, `Grade 3 Math 4s
## #   - Limited English Proficient` <int>, `Grade 3 Math 4s - Economically
## #   Disadvantaged` <int>, `Grade 4 ELA - All Students Tested` <int>,
## #   `Grade 4 ELA 4s - All Students` <int>, `Grade 4 ELA 4s - American
## #   Indian or Alaska Native` <int>, `Grade 4 ELA 4s - Black or African
## #   American` <int>, `Grade 4 ELA 4s - Hispanic or Latino` <int>, `Grade 4
## #   ELA 4s - Asian or Pacific Islander` <int>, `Grade 4 ELA 4s -
## #   White` <int>, `Grade 4 ELA 4s - Multiracial` <int>, `Grade 4 ELA 4s -
## #   Limited English Proficient` <int>, `Grade 4 ELA 4s - Economically
## #   Disadvantaged` <int>, `Grade 4 Math - All Students Tested` <int>,
## #   `Grade 4 Math 4s - All Students` <int>, `Grade 4 Math 4s - American
## #   Indian or Alaska Native` <int>, `Grade 4 Math 4s - Black or African
## #   American` <int>, `Grade 4 Math 4s - Hispanic or Latino` <int>, `Grade
## #   4 Math 4s - Asian or Pacific Islander` <int>, `Grade 4 Math 4s -
## #   White` <int>, `Grade 4 Math 4s - Multiracial` <int>, `Grade 4 Math 4s
## #   - Limited English Proficient` <int>, `Grade 4 Math 4s - Economically
## #   Disadvantaged` <int>, `Grade 5 ELA - All Students Tested` <int>,
## #   `Grade 5 ELA 4s - All Students` <int>, `Grade 5 ELA 4s - American
## #   Indian or Alaska Native` <int>, `Grade 5 ELA 4s - Black or African
## #   American` <int>, `Grade 5 ELA 4s - Hispanic or Latino` <int>, `Grade 5
## #   ELA 4s - Asian or Pacific Islander` <int>, `Grade 5 ELA 4s -
## #   White` <int>, `Grade 5 ELA 4s - Multiracial` <int>, `Grade 5 ELA 4s -
## #   Limited English Proficient` <int>, `Grade 5 ELA 4s - Economically
## #   Disadvantaged` <int>, `Grade 5 Math - All Students Tested` <int>,
## #   `Grade 5 Math 4s - All Students` <int>, `Grade 5 Math 4s - American
## #   Indian or Alaska Native` <int>, `Grade 5 Math 4s - Black or African
## #   American` <int>, `Grade 5 Math 4s - Hispanic or Latino` <int>, `Grade
## #   5 Math 4s - Asian or Pacific Islander` <int>, `Grade 5 Math 4s -
## #   White` <int>, `Grade 5 Math 4s - Multiracial` <int>, `Grade 5 Math 4s
## #   - Limited English Proficient` <int>, `Grade 5 Math 4s - Economically
## #   Disadvantaged` <int>, `Grade 6 ELA - All Students Tested` <int>,
## #   `Grade 6 ELA 4s - All Students` <int>, `Grade 6 ELA 4s - American
## #   Indian or Alaska Native` <int>, `Grade 6 ELA 4s - Black or African
## #   American` <int>, ...

Our data looks fairly clean, but there is still work to be done: for example, we need to rename some of the variables and join the two tables. To make the analysis easier, let’s start the cleaning with reg.

Reg

We begin by renaming the columns we plan to use.
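As a minimal sketch of that renaming step (not necessarily the code used for this report), the snippet below renames a toy stand-in for reg in base R. The target names enroll, registered, and took match columns that appear later in the cleaned data; school, year, and grade are our own assumptions, and all values are made up. The enrollment column's full name is truncated in the printout above, so the sketch matches it by prefix rather than guessing the rest of the name.

```r
# Toy stand-in for `reg`; all values are made up for illustration.
reg_toy <- data.frame(
  `DBN` = c("A001", "A001"),
  `School name` = c("School A", "School A"),
  `Year of SHST` = c(2013L, 2014L),
  `Grade level` = c(8L, 8L),
  `Enrollment on (rest of name truncated in the printout)` = c(90L, 95L),
  `Number of students who registered for the SHSAT` = c(30L, 28L),
  `Number of students who took the SHSAT` = c(15L, 12L),
  check.names = FALSE  # keep the original names with spaces
)

# Map of old name -> new short name (short names are partly assumptions).
new_names <- c(
  `School name` = "school",
  `Year of SHST` = "year",
  `Grade level` = "grade",
  `Number of students who registered for the SHSAT` = "registered",
  `Number of students who took the SHSAT` = "took"
)

old <- names(reg_toy)
hits <- old %in% names(new_names)
names(reg_toy)[hits] <- unname(new_names[old[hits]])

# The enrollment column is matched by prefix because its full name is not shown.
names(reg_toy)[startsWith(old, "Enrollment")] <- "enroll"

names(reg_toy)
```

Short, space-free names avoid the need for backticks in every later expression.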

Before continuing, there are three things we need to consider when analyzing our data:

  • PASSNYC cares about increasing diversity; therefore we need to look at low-income students and underrepresented racial minority (URM) students.
  • PASSNYC wants to improve the situation through its offerings: after-school programs, test preparation, mentoring, and resources for parents.
  • Success will be measured by how many students register for and then take the test, with the academic performance of the school as an additional indicator.
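To make the success measure concrete, here is a small sketch on toy data. It assumes that the counts are averaged across years within each school (the non-integer quartiles of enroll in the later summary suggest something like this) and that yield means the share of registrants who actually took the test. Both are assumptions on our part, and regRate is a hypothetical name that is not the regPct column of the cleaned data.

```r
# Toy registration data: one row per school and year (values are made up).
reg_toy <- data.frame(
  DBN        = c("A001", "A001", "B002", "B002"),
  enroll     = c(90, 100, 60, 60),
  registered = c(30, 50, 10, 20),
  took       = c(15, 25, 5, 10)
)

# Average each count across years within a school.
reg_avg <- aggregate(cbind(enroll, registered, took) ~ DBN,
                     data = reg_toy, FUN = mean)

# Share of enrolled students who registered (hypothetical name).
reg_avg$regRate <- reg_avg$registered / reg_avg$enroll

# Share of registrants who went on to take the test (assumed definition of yield).
reg_avg$yield <- reg_avg$took / reg_avg$registered

reg_avg
```

A low yield flags schools where students sign up but do not show up, which is a different problem from low registration and may call for a different intervention.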

We consider a school to be economically stratified if its economic need, as measured by the Economic Need Index (ENI), is more than 10 percentage points from the citywide average. A school can be stratified in either direction: by serving more low-income or more high-income children. New York City reports that 70.6% of schools are economically stratified today.
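The definition above translates directly into a small helper. This is a sketch under the assumption that ENI is stored on a 0 to 1 scale (as it is in the cleaned data), so 10 percentage points corresponds to 0.10. The function name classify_stratification and the citywide average of 0.67 are illustrative placeholders, not official figures.

```r
# Label a school by how its ENI compares with the citywide average.
# `band` is the half-width of the "not stratified" zone (0.10 = 10 points).
classify_stratification <- function(eni, citywide_avg, band = 0.10) {
  diff <- eni - citywide_avg
  ifelse(diff > band, "higher need",
         ifelse(diff < -band, "lower need", "not stratified"))
}

# Toy ENI values (made up); the last one is missing and stays NA.
eni_toy <- c(0.92, 0.70, 0.55, NA)
labels <- classify_stratification(eni_toy, citywide_avg = 0.67)
labels

# Share of schools with a known ENI that are stratified in either direction.
mean(labels != "not stratified", na.rm = TRUE)
```

Keeping the direction in the label, rather than a single TRUE/FALSE flag, makes it easy to separate schools serving more low-income children from those serving more high-income children.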

Exploratory Data Analysis

Demographics

## # A tibble: 6 x 67
##   school DBN   district   lat  long address City    zip commSchool   eni
##   <chr>  <chr>    <int> <dbl> <dbl> <chr>   <chr> <int> <chr>      <dbl>
## 1 P.S. ~ 01M0~        1  40.7 -74.0 333 E ~ NEW ~ 10009 Yes        0.919
## 2 P.S. ~ 01M0~        1  40.7 -74.0 185 1S~ NEW ~ 10003 No         0.641
## 3 P.S. ~ 01M0~        1  40.7 -74.0 166 ES~ NEW ~ 10002 No         0.744
## 4 P.S. ~ 01M0~        1  40.7 -74.0 730 E ~ NEW ~ 10009 No         0.86 
## 5 THE S~ 01M0~        1  40.7 -74.0 121 E ~ NEW ~ 10009 No         0.73 
## 6 P.S. ~ 01M0~        1  40.7 -74.0 600 E ~ NEW ~ 10009 No         0.858
## # ... with 57 more variables: income <dbl>, pctELL <dbl>, pctAttend <dbl>,
## #   pctAbsentChronic <dbl>, pctRigor <dbl>, ratingRigor <chr>,
## #   pctCollab <dbl>, ratingCollab <chr>, pctSupp <dbl>, ratingSupp <chr>,
## #   pctLeader <dbl>, ratingLeader <chr>, pctCommunity <dbl>,
## #   ratingCommunity <chr>, pctTrust <dbl>, ratingTrust <chr>, `Student
## #   Achievement Rating` <chr>, avgELA <dbl>, avgMath <dbl>, elaAll <int>,
## #   elaAll4 <int>, elaBlack <int>, elaHispanic <int>, elaAsian <int>,
## #   elaWhite <int>, mathAll <int>, mathAll4 <int>, mathBlack <int>,
## #   mathHispanic <int>, mathAsian <int>, mathWhite <int>, enroll <dbl>,
## #   registered <dbl>, took <dbl>, regPct <dbl>, tookPct <dbl>,
## #   yield <dbl>, academicScore <dbl>, quantRigor <dbl>, quantCollab <dbl>,
## #   quantSupp <dbl>, quantLeader <dbl>, quantCommunity <dbl>,
## #   quantTrust <dbl>, pctELA4 <dbl>, pctELABlack <dbl>,
## #   pctELAHispanic <dbl>, pctELAAsian <dbl>, pctELAWhite <dbl>,
## #   pctMath4 <dbl>, pctMathBlack <dbl>, pctMathHispanic <dbl>,
## #   pctMathAsian <dbl>, pctMathWhite <dbl>, URM4 <int>, race <fct>,
## #   percent <dbl>

Most NYC schools have a low share of White and Asian students but a high share of URM students. Where are these schools located?

Looking at the districts, most of our students come from Districts 9, 10, 31, 2, and 27.

The graphs suggest that schools are racially stratified within NYC. Asian and White students are relatively spread out, but they are clearly concentrated in more affluent areas of the city (e.g. Manhattan, Staten Island). Interestingly, URM students are also stratified in where they attend school: Hispanic students are mostly in the northern parts of the Bronx, Brooklyn and Queens, while Black students are in the heart of Brooklyn, with some in Harlem. When we combine the URM groups, we see three clear clusters of where they primarily attend school: the Bronx, Brooklyn-Queens, and a new cluster we did not see before, West New Brighton.

This exploratory plot suggests that race and income interact in the form of stratification, so let’s take a further look at the ENI of NYC schools:

As suspected, schools with a high URM share have a high ENI. Let’s explore further:

## 
## Call:
## lm(formula = eni ~ pctBlackHispanic, data = school_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.45278 -0.08124  0.01480  0.08578  0.53042 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       0.26611    0.01011   26.32   <2e-16 ***
## pctBlackHispanic  0.55578    0.01283   43.32   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1333 on 1246 degrees of freedom
##   (25 observations deleted due to missingness)
## Multiple R-squared:  0.601,  Adjusted R-squared:  0.6007 
## F-statistic:  1877 on 1 and 1246 DF,  p-value: < 2.2e-16

As we can see, there is a strong relationship between URM share and ENI, and between the share of White students and ENI: the former is positive, the latter negative. The regression above has an R-squared of 0.60 (equivalent to a correlation of about 0.78 between URM share and ENI), which tells us that schools with a higher URM population tend to have higher economic need.
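For a regression with a single predictor, the R-squared equals the squared Pearson correlation between the two variables, which is how the reported 0.601 maps to a correlation of roughly 0.78. The sketch below checks that identity on simulated data; the simulated numbers are made up and only loosely mimic the fitted coefficients.

```r
set.seed(1)

# Simulated stand-ins for URM share (x) and ENI (y); not the real data.
x <- runif(200)
y <- 0.27 + 0.56 * x + rnorm(200, sd = 0.13)

fit <- lm(y ~ x)
r2  <- summary(fit)$r.squared  # R-squared from the one-predictor regression
r   <- cor(x, y)               # Pearson correlation

# In simple linear regression these two quantities agree.
all.equal(r2, r^2)

# Correlation implied by the R-squared reported in the output above.
sqrt(0.601)
```

The identity holds only with one predictor; in the two-predictor model of the next section, R-squared is no longer the square of any single pairwise correlation.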

Academic Performance

Now let’s take a look at academic performance, as it is the main thing we’re looking to improve for PASSNYC:

## 
## Call:
## lm(formula = academicScore ~ eni + pctBlackHispanic, data = school_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.25036 -0.30770 -0.06941  0.22148  2.65450 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       7.21051    0.04629  155.78   <2e-16 ***
## eni              -1.78673    0.10400  -17.18   <2e-16 ***
## pctBlackHispanic -1.10934    0.07460  -14.87   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4846 on 1215 degrees of freedom
##   (55 observations deleted due to missingness)
## Multiple R-squared:  0.6523, Adjusted R-squared:  0.6517 
## F-statistic:  1140 on 2 and 1215 DF,  p-value: < 2.2e-16

Economic need is a strong predictor of how well a school will do, and that makes sense: the higher the students’ need, the lower the scores tend to be, plausibly because of resource insecurity. As we found earlier, ENI and race are fairly strongly correlated, but there must be schools that produce a high number of 4s despite their high ENI.
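One possible way to surface such schools, which is not necessarily the approach taken in this report, is to look at regression residuals: a school with a large positive residual scores higher than its economic need alone would predict. The sketch below uses made-up toy data with the same column names as the cleaned table.

```r
# Toy data (made up): School B has high need but an unusually high score.
toy <- data.frame(
  school        = paste("School", LETTERS[1:8]),
  eni           = c(0.90, 0.85, 0.80, 0.75, 0.50, 0.40, 0.30, 0.20),
  academicScore = c(4.2, 5.9, 4.5, 4.6, 5.4, 5.8, 6.0, 6.4)
)

# Predict score from economic need, then keep what the model cannot explain.
fit <- lm(academicScore ~ eni, data = toy)
toy$resid <- resid(fit)

# Schools that most outperform their predicted score come first.
ranked <- toy[order(-toy$resid), ]
head(ranked, 3)
```

Ranking by residual rather than by raw score keeps low-need, high-scoring schools from crowding out the high-need schools we are looking for.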

Examining Academic Performance

##     school              DBN               district          lat       
##  Length:1273        Length:1273        Min.   : 1.00   Min.   :40.51  
##  Class :character   Class :character   1st Qu.: 9.00   1st Qu.:40.67  
##  Mode  :character   Mode  :character   Median :15.00   Median :40.72  
##                                        Mean   :16.13   Mean   :40.73  
##                                        3rd Qu.:24.00   3rd Qu.:40.82  
##                                        Max.   :32.00   Max.   :40.90  
##                                                                       
##       long          address              City                zip       
##  Min.   :-74.24   Length:1273        Length:1273        Min.   :10001  
##  1st Qu.:-73.96   Class :character   Class :character   1st Qu.:10452  
##  Median :-73.92   Mode  :character   Mode  :character   Median :11203  
##  Mean   :-73.92                                         Mean   :10815  
##  3rd Qu.:-73.88                                         3rd Qu.:11232  
##  Max.   :-73.71                                         Max.   :11694  
##                                                                        
##   commSchool             eni             income           pctELL      
##  Length:1273        Min.   :0.0490   Min.   : 16902   Min.   :0.0000  
##  Class :character   1st Qu.:0.5500   1st Qu.: 33610   1st Qu.:0.0400  
##  Mode  :character   Median :0.7310   Median : 43151   Median :0.0900  
##                     Mean   :0.6724   Mean   : 48443   Mean   :0.1248  
##                     3rd Qu.:0.8410   3rd Qu.: 58518   3rd Qu.:0.1700  
##                     Max.   :0.9570   Max.   :181382   Max.   :0.9900  
##                     NA's   :25       NA's   :397                      
##     pctAsian         pctBlack       pctHispanic     pctBlackHispanic
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0200   Min.   :0.0300  
##  1st Qu.:0.0100   1st Qu.:0.0600   1st Qu.:0.1800   1st Qu.:0.4900  
##  Median :0.0400   Median :0.2400   Median :0.3600   Median :0.9000  
##  Mean   :0.1164   Mean   :0.3202   Mean   :0.4115   Mean   :0.7316  
##  3rd Qu.:0.1400   3rd Qu.:0.5600   3rd Qu.:0.6400   3rd Qu.:0.9600  
##  Max.   :0.9500   Max.   :0.9700   Max.   :1.0000   Max.   :1.0000  
##                                                                     
##     pctWhite        pctAttend      pctAbsentChronic    pctRigor     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0100   1st Qu.:0.9200   1st Qu.:0.1100   1st Qu.:0.8600  
##  Median :0.0300   Median :0.9400   Median :0.2000   Median :0.9000  
##  Mean   :0.1315   Mean   :0.9272   Mean   :0.2159   Mean   :0.8948  
##  3rd Qu.:0.1600   3rd Qu.:0.9500   3rd Qu.:0.3000   3rd Qu.:0.9400  
##  Max.   :0.9200   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##                   NA's   :25       NA's   :25       NA's   :25      
##  ratingRigor          pctCollab      ratingCollab          pctSupp      
##  Length:1273        Min.   :0.0000   Length:1273        Min.   :0.0000  
##  Class :character   1st Qu.:0.8500   Class :character   1st Qu.:0.8400  
##  Mode  :character   Median :0.9000   Mode  :character   Median :0.8900  
##                     Mean   :0.8843                      Mean   :0.8875  
##                     3rd Qu.:0.9400                      3rd Qu.:0.9400  
##                     Max.   :1.0000                      Max.   :1.0000  
##                     NA's   :25                          NA's   :25      
##   ratingSupp          pctLeader      ratingLeader        pctCommunity   
##  Length:1273        Min.   :0.0000   Length:1273        Min.   :0.0000  
##  Class :character   1st Qu.:0.7600   Class :character   1st Qu.:0.8000  
##  Mode  :character   Median :0.8300   Mode  :character   Median :0.8300  
##                     Mean   :0.8161                      Mean   :0.8309  
##                     3rd Qu.:0.8900                      3rd Qu.:0.8700  
##                     Max.   :0.9900                      Max.   :0.9900  
##                     NA's   :25                          NA's   :25      
##  ratingCommunity       pctTrust      ratingTrust       
##  Length:1273        Min.   :0.0000   Length:1273       
##  Class :character   1st Qu.:0.8700   Class :character  
##  Mode  :character   Median :0.9200   Mode  :character  
##                     Mean   :0.9042                     
##                     3rd Qu.:0.9400                     
##                     Max.   :1.0000                     
##                     NA's   :25                         
##  Student Achievement Rating     avgELA         avgMath     
##  Length:1273                Min.   :1.810   Min.   :1.830  
##  Class :character           1st Qu.:2.250   1st Qu.:2.300  
##  Mode  :character           Median :2.450   Median :2.580  
##                             Mean   :2.534   Mean   :2.668  
##                             3rd Qu.:2.760   3rd Qu.:2.980  
##                             Max.   :3.930   Max.   :4.200  
##                             NA's   :55      NA's   :55     
##      elaAll          elaAll4           elaBlack       elaHispanic    
##  Min.   :  0.00   Min.   :  0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:  0.00   1st Qu.:  0.000   1st Qu.: 0.000   1st Qu.: 0.000  
##  Median :  0.00   Median :  0.000   Median : 0.000   Median : 0.000  
##  Mean   : 52.15   Mean   :  7.317   Mean   : 0.916   Mean   : 1.519  
##  3rd Qu.: 74.00   3rd Qu.:  4.000   3rd Qu.: 0.000   3rd Qu.: 1.000  
##  Max.   :743.00   Max.   :261.000   Max.   :59.000   Max.   :62.000  
##                                                                      
##     elaAsian          elaWhite         mathAll          mathAll4      
##  Min.   :  0.000   Min.   :  0.00   Min.   :  0.00   Min.   :  0.000  
##  1st Qu.:  0.000   1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.000  
##  Median :  0.000   Median :  0.00   Median :  0.00   Median :  0.000  
##  Mean   :  2.254   Mean   :  1.93   Mean   : 43.84   Mean   :  4.908  
##  3rd Qu.:  0.000   3rd Qu.:  0.00   3rd Qu.: 59.00   3rd Qu.:  1.000  
##  Max.   :203.000   Max.   :116.00   Max.   :652.00   Max.   :312.000  
##                                                                       
##    mathBlack         mathHispanic       mathAsian         mathWhite       
##  Min.   :  0.0000   Min.   : 0.0000   Min.   :  0.000   Min.   :  0.0000  
##  1st Qu.:  0.0000   1st Qu.: 0.0000   1st Qu.:  0.000   1st Qu.:  0.0000  
##  Median :  0.0000   Median : 0.0000   Median :  0.000   Median :  0.0000  
##  Mean   :  0.6096   Mean   : 0.9466   Mean   :  1.983   Mean   :  0.9701  
##  3rd Qu.:  0.0000   3rd Qu.: 0.0000   3rd Qu.:  0.000   3rd Qu.:  0.0000  
##  Max.   :107.0000   Max.   :71.0000   Max.   :246.000   Max.   :126.0000  
##                                                                           
##      enroll         registered         took            regPct      
##  Min.   : 55.00   Min.   : 4.00   Min.   : 3.429   Min.   :0.0005  
##  1st Qu.: 64.75   1st Qu.:12.38   1st Qu.: 8.237   1st Qu.:0.0016  
##  Median : 79.66   Median :22.62   Median :10.417   Median :0.0029  
##  Mean   : 88.15   Mean   :27.33   Mean   :12.983   Mean   :0.0034  
##  3rd Qu.: 98.88   3rd Qu.:33.41   3rd Qu.:16.188   3rd Qu.:0.0042  
##  Max.   :205.00   Max.   :90.00   Max.   :32.000   Max.   :0.0100  
##  NA's   :1251     NA's   :1251    NA's   :1251     NA's   :1251    
##     tookPct           yield        academicScore     quantRigor   
##  Min.   :0.0004   Min.   :0.2196   Min.   :3.790   Min.   :0.000  
##  1st Qu.:0.0011   1st Qu.:0.4194   1st Qu.:4.550   1st Qu.:2.000  
##  Median :0.0015   Median :0.5499   Median :5.030   Median :3.000  
##  Mean   :0.0016   Mean   :0.5697   Mean   :5.202   Mean   :2.808  
##  3rd Qu.:0.0019   3rd Qu.:0.7162   3rd Qu.:5.720   3rd Qu.:4.000  
##  Max.   :0.0046   Max.   :0.9412   Max.   :8.080   Max.   :4.000  
##  NA's   :1251     NA's   :1251     NA's   :55                     
##   quantCollab      quantSupp      quantLeader   quantCommunity 
##  Min.   :0.000   Min.   :0.000   Min.   :0.00   Min.   :0.000  
##  1st Qu.:3.000   1st Qu.:2.000   1st Qu.:2.00   1st Qu.:2.000  
##  Median :3.000   Median :3.000   Median :3.00   Median :3.000  
##  Mean   :2.989   Mean   :2.853   Mean   :2.72   Mean   :2.535  
##  3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:3.00   3rd Qu.:3.000  
##  Max.   :4.000   Max.   :4.000   Max.   :4.00   Max.   :4.000  
##                                                                
##    quantTrust       pctELA4        pctELABlack     pctELAHispanic  
##  Min.   :0.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:3.000   1st Qu.:0.0303   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :3.000   Median :0.0697   Median :0.0000   Median :0.2000  
##  Mean   :2.901   Mean   :0.1167   Mean   :0.2395   Mean   :0.3249  
##  3rd Qu.:4.000   3rd Qu.:0.1528   3rd Qu.:0.4286   3rd Qu.:0.5714  
##  Max.   :4.000   Max.   :0.8571   Max.   :1.0000   Max.   :1.0000  
##                  NA's   :712      NA's   :768      NA's   :768     
##   pctELAAsian      pctELAWhite        pctMath4    pctMathBlack   
##  Min.   :0.0000   Min.   :0.0000   Min.   :1     Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1     1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :1     Median :0.0000  
##  Mean   :0.1178   Mean   :0.1158   Mean   :1     Mean   :0.2023  
##  3rd Qu.:0.1600   3rd Qu.:0.1364   3rd Qu.:1     3rd Qu.:0.2614  
##  Max.   :0.8750   Max.   :1.0000   Max.   :1     Max.   :1.0000  
##  NA's   :768      NA's   :768      NA's   :902   NA's   :902     
##  pctMathHispanic   pctMathAsian    pctMathWhite         URM4        
##  Min.   :0.0000   Min.   :0.000   Min.   :0.0000   Min.   :  0.000  
##  1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:  0.000  
##  Median :0.1139   Median :0.000   Median :0.0000   Median :  0.000  
##  Mean   :0.2987   Mean   :0.168   Mean   :0.1052   Mean   :  3.991  
##  3rd Qu.:0.5000   3rd Qu.:0.250   3rd Qu.:0.0033   3rd Qu.:  3.000  
##  Max.   :1.0000   Max.   :1.000   Max.   :1.0000   Max.   :212.000  
##  NA's   :902      NA's   :902     NA's   :902

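One way of understanding performance is to examine the number of URM students receiving 4s on their exams. Rather than looking at every school, why not target the highest-performing ones? We target schools that fall 1 standard deviation above the mean (labeled here as the top 2.5% of schools, although for normally distributed data 1 standard deviation above the mean marks roughly the top 16%, and the top 2.5% begins about 2 standard deviations above it) to produce a list of schools we can specifically target with PASSNYC’s programs to further bolster their numbers. The resulting list has 62 schools, about 5% of the 1,273 in the data: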

school_2.5 <- school_2.5 %>%
  arrange(desc(count))
school_2.5
## # A tibble: 62 x 2
##    school                                             count
##    <chr>                                              <int>
##  1 SUCCESS ACADEMY CHARTER SCHOOL - HARLEM 1            212
##  2 I.S. 145 JOSEPH PULITZER                             133
##  3 I.S. 73 - THE FRANK SANSIVIERI INTERMEDIATE SCHOOL    93
##  4 M.S. 180 DR. DANIEL HALE WILLIAMS                     57
##  5 I.S. 227 LOUIS ARMSTRONG                              55
##  6 J.H.S. 383 PHILIPPA SCHUYLER                          55
##  7 P.S. 189 THE BILINGUAL CENTER                         51
##  8 I.S. 230                                              50
##  9 SCHOLARS' ACADEMY                                     50
## 10 ACHIEVEMENT FIRST BUSHWICK CHARTER SCHOOL             49
## # ... with 52 more rows
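The code that builds school_2.5 is not shown, so the following is only a sketch of how a standard-deviation cutoff on the count of URM students scoring a 4 could be applied. The toy counts are made up, and the two pnorm() calls show the normal-distribution tail shares for 1 and 2 standard deviations for comparison.

```r
# Toy counts of URM students scoring a 4, one row per school (made up).
toy <- data.frame(
  school = paste("School", LETTERS[1:10]),
  count  = c(0, 0, 1, 3, 0, 2, 70, 0, 4, 120)
)

# Cutoffs at 1 and 2 standard deviations above the mean.
cutoff_1sd <- mean(toy$count) + sd(toy$count)
cutoff_2sd <- mean(toy$count) + 2 * sd(toy$count)

above_1sd <- toy[toy$count > cutoff_1sd, ]
above_2sd <- toy[toy$count > cutoff_2sd, ]

# Schools passing the 1 SD cutoff, largest count first.
above_1sd[order(-above_1sd$count), ]

# Under a normal distribution, the share of values beyond each cutoff:
pnorm(1, lower.tail = FALSE)  # about 0.16
pnorm(2, lower.tail = FALSE)  # about 0.023
```

Because these counts are heavily right-skewed (most schools have zero), the share of schools above a standard-deviation cutoff can differ a lot from the normal-distribution figures, so it is worth reporting the actual number of schools selected.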

Recommendation 2

Recommendation 1: Geolocation Strategy

Based on the information above, we can see there is a clear corridor of high scores among high-need, high-URM schools. PASSNYC can concentrate marketing and outreach efforts on this area to begin raising awareness of its solutions and gain buy-in from parents, teachers, and administrators.

Examining Our High-Potential Diverse Schools

## # A tibble: 29 x 70
##    school DBN   district   lat  long address City    zip commSchool   eni
##    <chr>  <chr>    <int> <dbl> <dbl> <chr>   <chr> <int> <chr>      <dbl>
##  1 THE M~ 06M2~        6  40.8 -74.0 71-111~ NEW ~ 10027 No         0.724
##  2 P.S. ~ 08X3~        8  40.8 -73.8 2750 L~ BRONX 10465 No         0.388
##  3 P.S. ~ 15K1~       15  40.7 -74.0 825 4T~ BROO~ 11232 No         0.715
##  4 ROBER~ 24Q5~       24  40.7 -73.9 47-07 ~ LONG~ 11101 No         0.513
##  5 P.S. ~ 25Q1~       25  40.8 -73.8 128-02~ COLL~ 11356 No         0.454
##  6 P.S. ~ 27Q2~       27  40.7 -73.8 84-40 ~ RICH~ 11418 No         0.603
##  7 ALL C~ 32K5~       32  40.7 -73.9 321 PA~ BROO~ 11237 No         0.667
##  8 ACHIE~ 84K5~       32  40.7 -73.9 1300 G~ BROO~ 11237 No         0.718
##  9 CENTR~ 84Q0~       24  40.7 -73.9 55-30 ~ ELMH~ 11373 No         0.688
## 10 SOUTH~ 84X3~       12  40.8 -73.9 977 FO~ BRONX 10459 No         0.756
## # ... with 19 more rows, and 60 more variables: income <dbl>,
## #   pctELL <dbl>, pctAsian <dbl>, pctBlack <dbl>, pctHispanic <dbl>,
## #   pctBlackHispanic <dbl>, pctWhite <dbl>, pctAttend <dbl>,
## #   pctAbsentChronic <dbl>, pctRigor <dbl>, ratingRigor <chr>,
## #   pctCollab <dbl>, ratingCollab <chr>, pctSupp <dbl>, ratingSupp <chr>,
## #   pctLeader <dbl>, ratingLeader <chr>, pctCommunity <dbl>,
## #   ratingCommunity <chr>, pctTrust <dbl>, ratingTrust <chr>, `Student
## #   Achievement Rating` <chr>, avgELA <dbl>, avgMath <dbl>, elaAll <int>,
## #   elaAll4 <int>, elaBlack <int>, elaHispanic <int>, elaAsian <int>,
## #   elaWhite <int>, mathAll <int>, mathAll4 <int>, mathBlack <int>,
## #   mathHispanic <int>, mathAsian <int>, mathWhite <int>, enroll <dbl>,
## #   registered <dbl>, took <dbl>, regPct <dbl>, tookPct <dbl>,
## #   yield <dbl>, academicScore <dbl>, quantRigor <dbl>, quantCollab <dbl>,
## #   quantSupp <dbl>, quantLeader <dbl>, quantCommunity <dbl>,
## #   quantTrust <dbl>, pctELA4 <dbl>, pctELABlack <dbl>,
## #   pctELAHispanic <dbl>, pctELAAsian <dbl>, pctELAWhite <dbl>,
## #   pctMath4 <dbl>, pctMathBlack <dbl>, pctMathHispanic <dbl>,
## #   pctMathAsian <dbl>, pctMathWhite <dbl>, URM4 <int>

When examining the high-potential (hi-po) population, we can see that Districts 4, 5, 7, 8, 9, 11, 23, 24, and 27 have hi-po students.
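A district list like the one above can be read off a simple tally. The sketch below counts hi-po schools per district on a toy table; hipo_toy and its values are made up, and only the district column name is taken from the cleaned data.

```r
# Toy table of hi-po schools and their districts (values are made up).
hipo_toy <- data.frame(
  school   = paste("School", 1:6),
  district = c(9, 9, 24, 27, 9, 24)
)

# Number of hi-po schools per district, largest first.
district_counts <- sort(table(hipo_toy$district), decreasing = TRUE)
district_counts

# Districts that have at least one hi-po school.
sort(as.integer(names(district_counts)))
```

Sorting by count rather than by district number shows at a glance where outreach would reach the most hi-po schools.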

Correlation Matrix

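# Keep only complete rows. na.omit() drops any row with at least one NA, and the
# SHSAT columns (enroll, registered, took, ...) are NA for 1,251 of 1,273 schools.
numeric <- na.omit(school_clean)

# Correlate the numeric columns of interest and plot the upper triangle.
numeric %>%
  select(eni, income, yield, enroll, regPct, tookPct,
         starts_with('pct'), starts_with('avg'), starts_with('quant')) %>%
  cor() %>%
  corrplot(type = "upper", method = "square")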
## Warning in cor(.): the standard deviation is zero
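The warning means that at least one selected column is constant, so its correlation with everything else is undefined (NA). In the summary above, pctMath4 is equal to 1 for every school where it is present, which makes it a likely culprit. A minimal sketch of one fix, shown on made-up toy data, is to drop zero-variance columns before calling cor():

```r
# Toy data (made up): pctMath4 is constant, like in the summary above.
toy <- data.frame(
  eni      = c(0.9, 0.7, 0.5, 0.3),
  avgMath  = c(2.1, 2.5, 2.9, 3.4),
  pctMath4 = c(1, 1, 1, 1)
)

# Standard deviation of each column; constant columns have sd == 0.
sds  <- vapply(toy, sd, numeric(1))
keep <- sds > 0

# Correlation matrix on the remaining columns only: no warning, no NAs.
cor_mat <- cor(toy[, keep, drop = FALSE])
cor_mat
```

The filtered matrix can then be passed to corrplot() exactly as before.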